Skip to content

tls: Make session resume key shared across credentials builders creds #2709

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

elcallio
Copy link
Contributor

@elcallio elcallio commented Apr 1, 2025

Fixes #2708

Session resume key (if enabled) was only generated/kept on credentials level. This means that access patterns where we create a new such object per shard, using same builder, would have different session keys, and thus not be able to resume each others sessions.

This moves key generation to builder level and re-uses this for all generated server credentials. Note that a reload of any file (in reloadable) will re-create the key (though it should become invalid anyway, but...).

Add tests for this, and a hefty warning that session keys are shared, should anyone try to re-use a builder for disparate credentials.

Copy link
Contributor

@nyh nyh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good to me although I have not been able to test it yet myself. I understand that @dkropachev was able to test that it fixes the right problem (sessions were not being reused on a multi-shard applications, but were reused on a single shard), but @dkropachev - were you able to test that this specific patch fixes the problem on multi-shard runs?

@elcallio I couldn't understand where exactly in the code do we have a cross-shard cache of these session keys, if this is done in a safe multi-threaded way, and when do we ever drop old session keys from this cache, but I'm sure you thought about it already and it works well :-)

@nyh
Copy link
Contributor

nyh commented Apr 2, 2025

I think I can merge this patch already, but since you specifically asked @xemul and @tchaikov to review, I'll give them the courtesy of being able to review, if they want.

@elcallio
Copy link
Contributor Author

elcallio commented Apr 2, 2025

Not a cross shard cache, but a pre-computed key in the credentials builder. This is applied to all credentials created with this builder (or a copy of it).
The usage pattern is

  • create builder (shard 0)
  • call each shard with builder, create credentials used for TLS sessions. They now all use the same key

Session keys are re-generated every time you set the session resume mode, or every time a reloadable credentials do a reload.

Copy link
Contributor

@tchaikov tchaikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm apart from a nit.

Copy link
Contributor

@tchaikov tchaikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Comment on lines 719 to 730
void tls::credentials_builder::set_session_resume_mode(session_resume_mode m) {
_session_resume_mode = m;
if (m != session_resume_mode::NONE) {
gnutls_datum key;
gtls_chk(gnutls_session_ticket_key_generate(&key));
_session_resume_key.assign(key.data, key.data + key.size);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an idea, If we want to tls renegotiation to work even when node is being restarted we need _session_resume_key to be persisted either in the config or as a file or any other way.

Which means we will need another API that could feed pregenerated _session_resume_key to the credentials builder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is unnecessary. If you have node restarts so frequent it impacts TLS session management, you have a bigger problem.
Should the need arise, get/set API:s for this can be added, but for now I'd say it is overreach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is unnecessary. If you have node restarts so frequent it impacts TLS session management, you have a bigger problem. Should the need arise, get/set API:s for this can be added, but for now I'd say it is overreach.

It is not about frequency, it is about of amount of clients that are trying to reconnect at the same time.
When client is reconnecting it establishes TLS connection first and the go thru CQL initialization process, CPU pressure of having thousands of clients reconnecting at the same time could be so high that it impedes CQL initialization process, as result client breaks connection and reconnects again.
Though, on second reconnect, it should have TLS ticket learned, so it should go smoother.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point is that a restart should be infrequent. Connections can use session resume across the nodes uptime or the ticket expiry, whichever comes first (lets hope the latter). But also remember that storing the session key across restarts would more or less require them to be clean restarts (because we do want to refresh the key at least on cert reload etc), which, in the case of crashes, makes it even less useful.
Also remember that using the same key for to long is a security concern, and already the cross-session usage here is somewhat dubious (but since we can treat cross-shard instances as equivalent to a "normal" server single session ok).

size = 0;
gnutls_datum(void* d = nullptr, size_t s = 0) {
data = reinterpret_cast<unsigned char*>(d);
size = s;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this is internal, it's still a dangerous constructor (esp. the non-explicit conversion from any pointer).

Please change to std::span<std::byte>.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better solution is to make the constructor only take a size and do the alloc internally, as the memory must be allocated by gnutls_malloc. Better to enforce.

src/net/tls.cc Outdated
@@ -436,12 +436,20 @@ class tls::certificate_credentials::impl: public gnutlsobj {
client_auth get_client_auth() const {
return _client_auth;
}
void set_session_resume_mode(session_resume_mode m) {
void set_session_resume_mode(session_resume_mode m, const std::vector<uint8_t>& key = {}) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here a span is more flexible than a vector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a huge overkill for an internal call. But sure.

Fixes scylladb#2708

Session resume key (if enabled) was only generated/kept on credentials
level. This means that access patterns where we create a new such object
per shard, using same builder, would have different session keys, and
thus not be able to resume each others sessions.

This moves key generation to builder level and re-uses this for all
generated server credentials. Note that a reload of any file (in reloadable)
will re-create the key (though it should become invalid anyway, but...).

Add tests for this, and a hefty warning that session keys are shared,
should anyone try to re-use a builder for disparate credentials.

v2:
* Make gnutls_datum more container-like (i.e. encapsulated sized alloc)
@elcallio
Copy link
Contributor Author

elcallio commented Apr 7, 2025

ping?

@xemul xemul closed this in 6b76277 Apr 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TLS session resume should share session key in certificate builder
5 participants